Abstract

Satisfiability solvers are increasingly playing a key role in software verification, with particularly effective use 
in the analysis of security vulnerabilities. String processing is a key part of many software applications, such as 
browsers and web servers. These applications are susceptible to attacks through malicious data received over the network.
Automated tools for analyzing the security of such applications thus need to reason about strings. For efficiency
reasons, it is desirable to have a solver that treats strings as first-class types. In this paper, we present some theories 
of strings that are useful in a software security context and analyze the computational complexity of the presented 
theories. We use this complexity analysis to motivate a byte-blast approach which employs a Boolean encoding of 
the string constraints to a corresponding Boolean satisfiability problem. 

1 Introduction 

Many security-critical applications such as Web servers routinely process strings as an essential part of their functionality.
They take strings as inputs, screen them using filters, manipulate them and use them for operations such as
database queries. It is pertinent to verify that these programs do not have vulnerabilities which can be used to compromise
system security. Verification and structured testing techniques to validate security of such applications often rely
on using constraint solvers. The frequent use of string operations in these applications has motivated several groups 
to explore the possibility of designing a constraint solver which treats strings as first-class types. Such a specialized 
solver for strings would further facilitate the use of constraint solving for analysis of security applications with string 
operations. 

Software applications use various string predicates and functions which are often made available to developers as
libraries. A satisfiability solver for string constraints must be able to handle these predicates and functions. From the
string constraints and predicates available in high level programming languages such as C, Java and C++, we identify
a set of core predicates and functions. Many other more complicated string-manipulating functions can be expressed
as simple compositions of these functions. We use these predicates and functions to define a theory of strings.

The main contribution of this paper is an analysis of the complexity of several fragments of the theory of strings. We
show that fairly small and simple-looking fragments are NP-complete. In light of the progress in SAT solving and
SMT solving for bit-vector arithmetic, these results indicate that a SAT-based approach to satisfiability
solving of string constraints is reasonable.

2 Related Work



Constraint solvers are widely used in verification and validation of software and hardware systems [16] [26]. In
particular, they have been used extensively for both static [10] [4] [3] and dynamic analysis [13] [15] of programs to
detect malicious code or security vulnerabilities in benign code. The use of constraint solving in software verification
is driven by the development of faster and more scalable SMT solvers for the theory of bit-vectors such as BAT [22],
Boolector, Beaver, MathSat, Spear [18], STP, UCLID and Z3. In particular, UCLID and
STP have been successfully used for security applications. For example, bit-vector solvers can be used to easily detect
overflow/underflow errors, which are the cause of many security vulnerabilities such as buffer overflow [24].

Analysis of string processing software is an important problem [27] [25] [28]. This makes it essential to develop verification
techniques that can efficiently handle constraints over strings. A scalable approach for solving string constraints
must treat strings as first class types and string library functions as native operations of the theory of strings [5].
Development of such a solver for a theory of strings would further facilitate the use of constraint solving for program
analysis, in general, and security applications, in particular. This will further push the frontier of program analysis in 
terms of scalability as well as program complexity. 

Previous efforts have developed decision procedures for regular expression containment [9]; more recently,
there have been some efforts to develop an SMT solver for the theory of strings.

In an independent and parallel work, Kiezun et al. [20] have developed a solver (HAMPI) for a theory of strings.
HAMPI works by reducing formulae over string constraints to bit-vector logic and then using a bit-vector solver
(STP) to check the satisfiability of the formulae. This reduction is achieved in two steps. First, HAMPI reduces the
string constraints specified using a rich input language to a core theory of strings comprising regular language
operations and a membership predicate. Second, the string constraints in this core language are translated to bit-vector
logic before invoking the bit-vector solver. They also show that the satisfiability problem for this theory of strings with
regular expression operations is NP-complete.

The string theory considered in this paper is different from the one considered in HAMPI. The string functions and 
predicates in our theory of strings are motivated by commonly used library functions in high level languages such as 
C and Java. The set of constraints expressible in our theory of strings is not comparable with the set of constraints
expressible in HAMPI. We identify constraints which can be expressed in our theory but not in HAMPI as well as
those which can be expressed in HAMPI but not in our theory. 

1. Our theory has a contains-at-position-i predicate, which is true if and only if its second argument string is contained in
its first argument string at exactly position i. We also have an extract-i-j function, which extracts a sub-string from
its string argument using the indices i and j. While the SMT solving approach of HAMPI can be used to handle
these constraints, the theory of strings considered in HAMPI is based on regular languages and cannot be used to
encode these constraints.

2. Our theory does not have a union or star operation and hence, constraints with union or star cannot be expressed in
our theory.

In particular, we note that the NP-completeness result established by Kiezun et al. [20] relies on the use of the union operation
to provide disjunction. We show that even without this union operation, the theory of strings is NP-complete and all
non-trivial fragments of the theory of strings are also NP-complete.

Bjorner et al. [2] propose another approach to solving string constraints arising out of path feasibility queries. Their
approach relies on identifying candidate string lengths by solving length constraints and then solving the string constraints
by considering them to be of the lengths found in the first phase. The string lengths found in the first phase may not
provide a solution even if the formula is satisfiable and hence, they need to iterate with different length assignments.
The string operations considered in their work are similar to the ones proposed here. We consider strings of bounded
lengths and we do not consider the replace operation. Hence, our fragment of string theory is decidable. In contrast
to the work of Bjorner et al., who presented decidability results for the theory of strings, we present complexity results on the
theory of strings and its different fragments.

3 Theory Definition



The definition of the theory of strings presented in this section is motivated by checking path feasibility queries over
programs written in high level languages such as Java, C and OCaml. The string libraries used in these high level
languages are abstracted as string functions and predicates. We now define the complete theory of strings using these
predicates and functions. Later, we will analyze the complexity of different fragments of this theory by
considering different subsets of string predicates and functions.

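$\textit{str-expr} ::= c \mid s \mid \textit{str-expr}[i : j] \mid \textit{str-expr}_1 \,@\, \textit{str-expr}_2$

$\textit{bool-expr} ::= \textit{true} \mid \textit{false} \mid \neg\,\textit{bool-expr} \mid \textit{str-expr}_1 = \textit{str-expr}_2 \mid \textit{str-expr}_1 \sqsupseteq \textit{str-expr}_2 \mid \textit{str-expr}_1 \sqsupseteq_i \textit{str-expr}_2$

$\textit{formula} ::= \textit{bool-expr} \mid \textit{bool-expr} \wedge \textit{formula}$

$i, j \in \mathbb{N}$; $s$, $s_i$ are string variables; $c$ represents a string constant.

Figure 1: Syntax for String Logic. $[i : j]$ denotes extraction of the sub-string starting at position $i$ and ending at
position $j$; $@$ denotes concatenation; $\sqsupseteq$ denotes containment; and $\sqsupseteq_i$ denotes containment at position $i$.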

The syntax of the statements in the theory of strings is given by the grammar in Figure 1. The strings are over some finite
alphabet $\Sigma$. The string constraints arising from software verification involve only finite length strings. The length of a
string is bounded by the length of the corresponding buffer. So, we require that the maximum length of each string is
bounded by a constant. Specifically, the lengths of all strings are bounded by some constant $L_{max}$. There is also an
empty string constant $\epsilon$. We describe the semantics of the predicates and functions used in the theory definition below.

String Predicates: The string predicates take two string arguments and evaluate to true or false. 

1. Equality: $s_i = s_j$ is true if and only if both $s_i$ and $s_j$ are assigned the same string constant; otherwise, it is
false.

2. Containment at position $i$: $s_1 \sqsupseteq_i s_2$ denotes that $s_2$ is contained in $s_1$ at position $i$. For example, bombay contains
bay at location 4. So, $bombay \sqsupseteq_4 bay$ evaluates to true.

3. Containment: $s_1 \sqsupseteq s_2$ denotes that $s_2$ is contained in $s_1$ at some position. In particular, the empty string $\epsilon$ is
contained in all strings and does not contain any non-empty string, that is, $\forall s,\ s \sqsupseteq \epsilon$ and $\forall s,\ s \neq \epsilon \Rightarrow \neg(\epsilon \sqsupseteq s)$.

String Functions: The two string functions considered in this paper are extraction and concatenation. 

1. Extraction: $s[i : j]$ has the type signature $\textit{str-expr} \times \textit{int} \times \textit{int} \to \textit{str-expr}$. It denotes the substring of $s$ starting
from position $i$ and ending at position $j$, where $i$ and $j$ are integers. For example, $bombay[4 : 6]$ evaluates to bay.

2. Concatenation: $s_1 @ s_2$ has the type signature $\textit{str-expr} \times \textit{str-expr} \to \textit{str-expr}$. It denotes the concatenation of the
two strings provided to it as arguments. For example, $bom @ bay$ evaluates to bombay.
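
The semantics described above can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for this illustration; positions are assumed to be 1-indexed and extraction bounds inclusive, which is what the bombay examples suggest.

```python
# Illustrative semantics only; the function names are hypothetical and are
# not taken from the paper. Positions are 1-indexed and extraction bounds
# are inclusive, matching "bombay contains bay at location 4" and
# "bombay[4 : 6] evaluates to bay".

def equals(s1: str, s2: str) -> bool:
    """Equality predicate: both arguments denote the same string."""
    return s1 == s2

def contains_at(s1: str, s2: str, i: int) -> bool:
    """Containment at position i: s2 occurs in s1 starting exactly at i."""
    return i >= 1 and s1.startswith(s2, i - 1)

def contains(s1: str, s2: str) -> bool:
    """Containment: s2 occurs in s1 at some position.

    The empty string is contained in every string and contains no
    non-empty string, as required by the theory definition.
    """
    return s2 in s1

def extract(s: str, i: int, j: int) -> str:
    """Extraction s[i : j]: substring from position i to position j."""
    return s[i - 1:j]

def concat(s1: str, s2: str) -> str:
    """Concatenation s1 @ s2."""
    return s1 + s2
```

For instance, `contains_at("bombay", "bay", 4)` holds while `contains_at("bombay", "bay", 3)` does not.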

4 Complexity Results 

Before stating and proving the complexity results, we present a brief summary of the results in this section and note that
all non-trivial fragments of the theory of strings are NP-complete. In Theorem 1, we show that the satisfiability problem
for the theory of strings as defined in Section 3 is in NP. This is a direct consequence of having a constant bound on the
size of any string. Hence, the satisfiability of any fragment of string theory is also in NP. Each fragment of the theory of
strings is defined by selecting some string predicates and functions along with Boolean negation and conjunction. As
discussed in Section 3, there are two functions and three predicates. To define a fragment of string theory, we need to
include at least one string predicate.

The three most elementary fragments of string theory are defined by including exactly one string predicate.

1. E: This fragment consists of string equality, Boolean negation and conjunction.

2. C: This fragment consists of string containment, Boolean negation and conjunction.

3. T: This fragment consists of string containment at position i, Boolean negation and conjunction.

The satisfiability problem for the C fragment is shown to be NP-complete in Theorem 6. The satisfiability problem for
the T fragment is also NP-complete, as shown in Corollary 2. We know that the E fragment is polynomial-time solvable using
congruence closure [1]. So, we extend the E (equality) fragment with different string predicates and functions, and
analyze its complexity.

1. E+C: This fragment extends E with string containment.

2. E+T: This fragment extends E with string containment at position i. If i is only allowed to be constant, the corresponding
logic is E+T-CONST.

3. E+A: This fragment extends E with the string concat function.

4. E+X: This fragment extends E with the string extract function. If the indices for extract are only constant, the corresponding
logic is E+X-CONST.

Since the satisfiability problems for the C and T fragments are NP-complete, it is natural that E+C and E+T would also
be NP-complete. We have separately proved the hardness results for both fragments in Theorem 2 and Theorem 3. It
is also shown in Theorem 4 and Theorem 5 that the satisfiability problems for the E+A and E+X fragments are
NP-complete.

Any extension of these fragments would also be NP-complete. So, the NP-completeness results for these minimal 
fragments of string theory presented in this section imply that the satisfiability problem for all fragments of string 
theory except for the E (equality) fragment is computationally hard. Thus, it is unlikely that there is any polynomial 
time algorithm for deciding the satisfiability of any non-trivial fragment of string theory unless P=NP. 

In the rest of the section, we state and prove the complexity results.

Theorem 1 The satisfiability problem over the theory of strings is in NP.

Proof: If a formula over the theory of strings is satisfiable, then the satisfying instance is an assignment of string
variables to strings with lengths upper bounded by the constant $L_{max}$. Hence, the size of the satisfying assignment is
at most $L_{max} N$ where $N$ is the number of string variables. Given such an assignment, each predicate and function can be
evaluated in time polynomial in the lengths of its arguments. So, the length of the certificate is polynomial in the size
of the input and hence, satisfiability of a formula in the theory of strings is in NP. □
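
The certificate argument can be made concrete with a small sketch of a verifier. The tuple-based encoding of terms and atoms, the names `eval_term`, `eval_atom` and `check_certificate`, and the value of `L_MAX` are all assumptions made for this illustration; the paper does not prescribe a representation.

```python
# A sketch of a certificate checker, under an assumed encoding:
#   term ::= ("var", name) | ("const", text)
#          | ("extract", term, i, j) | ("concat", term, term)
#   atom ::= ("eq", term, term) | ("contains", term, term)
#          | ("contains_at", term, term, i)
#   formula ::= list of (positive, atom); positive=False means negation.
L_MAX = 8  # hypothetical constant bound on string lengths

def eval_term(term, assignment):
    kind = term[0]
    if kind == "var":
        return assignment[term[1]]
    if kind == "const":
        return term[1]
    if kind == "extract":
        _, sub, i, j = term
        return eval_term(sub, assignment)[i - 1:j]
    if kind == "concat":
        return eval_term(term[1], assignment) + eval_term(term[2], assignment)
    raise ValueError(f"unknown term kind: {kind}")

def eval_atom(atom, assignment):
    op = atom[0]
    lhs = eval_term(atom[1], assignment)
    rhs = eval_term(atom[2], assignment)
    if op == "eq":
        return lhs == rhs
    if op == "contains":
        return rhs in lhs
    if op == "contains_at":
        return atom[3] >= 1 and lhs.startswith(rhs, atom[3] - 1)
    raise ValueError(f"unknown predicate: {op}")

def check_certificate(formula, assignment):
    """Verify an assignment in time polynomial in formula and string sizes."""
    if any(len(value) > L_MAX for value in assignment.values()):
        return False
    return all(eval_atom(atom, assignment) == positive
               for positive, atom in formula)
```

Each atom is evaluated once, and every string involved has bounded length, so the check is a single linear pass over the formula.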

As a consequence of the above theorem, the satisfiability of formulae in smaller fragments of the theory of strings such as
E+C, E+T-CONST, E+A and E+X-CONST is also in NP. Hence, we only need to show that satisfiability of formulae
in these fragments is NP-hard in order to prove that the satisfiability problem for these fragments is NP-complete.

In the rest of the section, we state and prove the NP-hardness results for each of these fragments. Let us consider a 3-CNF formula $\phi$ over a
set $X = \{x_1, x_2, \ldots, x_n\}$ of $n$ Boolean variables,

$\phi = \bigwedge_{i=1}^{m} (l_1^i \vee l_2^i \vee l_3^i)$

where each literal $l_j^i$ is $x_k$ or $\neg x_k$ for some $x_k \in X$. We know that 3-CNF-SAT is NP-complete [11]. We now reduce
this problem, that is, finding an assignment of variables in $X$ to $\{0, 1\}$ such that $\phi$ evaluates to 1, to the problem of
finding a satisfying assignment in the corresponding fragment of the theory of strings.



4.1 Equality + Containment (E+C) 

Theorem 2 The satisfiability problem over the theory of strings with equality, contains, Boolean negation and conjunction
(E+C fragment) is NP-hard.

Proof: We prove this by reducing 3-CNF-SAT to the E+C fragment of the theory of strings. We describe a transformation
that maps a 3-CNF Boolean formula to a formula in the E+C fragment of the theory of strings (over the alphabet $\Sigma = \{a\}$)
such that there is a satisfying assignment for the Boolean formula if and only if there is a satisfying assignment for the
formula over strings.

Let $\psi$ be defined as

$\psi(x_i) = s_i$ and $\psi(\neg x_i) = r_i$

where $s_i$ and $r_i$ are strings of length at most 1. $s_i = a$ if and only if $x_i$ is assigned true, otherwise, it is $\epsilon$. Similarly,
$r_i = a$ if and only if $x_i$ is assigned false, otherwise it is $\epsilon$. So, for any literal $l$, $\psi(l)$ would be $a$ if and only if $l$ is
assigned true.

We also need to add constraints to ensure consistency, that is, exactly one of $x_i$ or $\neg x_i$ is assigned true. For consistency,
for each variable $x_i$, we must have the constraint

$s_i \neq r_i$

This ensures that exactly one of $s_i$ or $r_i$ is $a$.
Each clause $c = l_1 \vee l_2 \vee l_3$ is transformed to

$v_c \neq \epsilon$

where $v_c$ is a new string variable for clause $c$ and is of length at most 3.

Thus, at least one of $\psi(l_i)$ must be $a$, which is possible if and only if $l_i$ is assigned true. So, at least one literal in each
clause is true.

A set of string constraints $\psi(\phi)$ is obtained by applying the above transformations to each clause $c_j$ in the 3-CNF Boolean
formula $\phi$ and taking the union of all the obtained string constraints.

Let $\mathcal{I}$ be a satisfying assignment to $\phi$ such that $\mathcal{I}(x)$ denotes the assignment to $x$. By construction, there is an
assignment $\mathcal{I}'$ to $\psi(\phi)$ such that $\mathcal{I}'(s_i) = a$ and $\mathcal{I}'(r_i) = \epsilon$ if and only if $\mathcal{I}(x_i) = true$.

Thus, the E+C fragment of string theory is NP-hard. □

Corollary 1 The satisfiability problem over the theory of strings with equality, contains at i where i is variable,
Boolean negation and conjunction (E+T-VAR fragment) is NP-hard.

Proof: $str_1 \sqsupseteq str_2$ can be rewritten as $str_1 \sqsupseteq_i str_2$ where $i$ is a new index variable. Hence, any formula in E+C can
be expressed as a formula in the E+T-VAR fragment. □






4.2 Equality + Containment-aT-Constant (E+T-CONST) 

Theorem 3 The satisfiability problem over the theory of strings with contains at constant position, equality, Boolean
negation and conjunction (E+T-CONST fragment) is NP-hard.

Proof: We prove this by reducing 3-CNF-SAT to the E+T-CONST fragment of the theory of strings. We describe a transformation
that maps a 3-CNF Boolean formula to a formula in the E+T-CONST fragment of the theory of strings (over the
alphabet $\Sigma = \{a, b\}$) such that there is a satisfying assignment for the Boolean formula if and only if there is a satisfying
assignment for the formula over strings.

Let $\psi$ be defined as

$\psi(x_i) = s_i$ and $\psi(\neg x_i) = r_i$

where $s_i$ and $r_i$ are strings of length at most 1. To make them exactly of length 1, we require $s_i \neq \epsilon \wedge r_i \neq \epsilon$. $s_i = a$ if
and only if $x_i$ is assigned true, otherwise, it is $b$. Similarly, $r_i = a$ if and only if $x_i$ is assigned false, otherwise it is $b$.
So, for any literal $l$, $\psi(l)$ would be $a$ if and only if $l$ is assigned true.

We also need to add constraints to ensure consistency, that is, exactly one of $x_i$ or $\neg x_i$ is assigned true. For consistency,
for each variable $x_i$, we must have the constraint

$s_i \neq r_i$

This ensures that exactly one of $s_i$ or $r_i$ is $a$.
Each clause $c = l_1 \vee l_2 \vee l_3$ is transformed to

$v_c \sqsupseteq_i \psi(l_i)$ for $i = 1, 2, 3$
$v_c \neq bbb$

where $v_c$ is a new variable for clause $c$ and is of length at most 3.

Thus, at least one of $\psi(l_i)$ must be $a$, which is possible if and only if $l_i$ is assigned true. So, at least one literal in each
clause is true.

A set of string constraints $\psi(\phi)$ is obtained by applying the above transformations to each clause $c_j$ in the 3-CNF Boolean
formula $\phi$ and taking the union of all the obtained string constraints.

Let $\mathcal{I}$ be a satisfying assignment to $\phi$ such that $\mathcal{I}(x)$ denotes the assignment to $x$. By construction, there is an
assignment $\mathcal{I}'$ to $\psi(\phi)$ such that $\mathcal{I}'(s_i) = a$ and $\mathcal{I}'(r_i) = b$ if and only if $\mathcal{I}(x_i) = true$.

Thus, the E+T-CONST fragment of string theory is NP-hard. □
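
The reduction in this proof can be exercised on tiny inputs with the following sketch. Everything about the encoding is an assumption for illustration: clauses are 3-tuples of nonzero integers (a positive integer k stands for x_k, a negative one for its negation), constraints are plain tuples, and satisfiability on both sides is decided by exhaustive enumeration, which is only feasible for very small formulas.

```python
from itertools import product

def psi(lit):
    """Name of the string variable for a literal: s_k for x_k, r_k for not x_k."""
    return f"s{lit}" if lit > 0 else f"r{-lit}"

def reduce_to_e_t_const(clauses, n):
    """Build the E+T-CONST constraints for a 3-CNF formula over n variables."""
    cons = []
    for k in range(1, n + 1):
        cons += [("neq_const", f"s{k}", ""), ("neq_const", f"r{k}", ""),
                 ("neq", f"s{k}", f"r{k}")]
    for c, clause in enumerate(clauses):
        # v_c contains psi(l_i) at constant position i, and v_c != bbb.
        cons += [("contains_at", f"v{c}", psi(lit), pos)
                 for pos, lit in enumerate(clause, start=1)]
        cons.append(("neq_const", f"v{c}", "bbb"))
    return cons

def holds(con, env):
    op = con[0]
    if op == "neq":
        return env[con[1]] != env[con[2]]
    if op == "neq_const":
        return env[con[1]] != con[2]
    if op == "contains_at":
        return env[con[1]].startswith(env[con[2]], con[3] - 1)
    raise ValueError(op)

def strings_up_to(alphabet, max_len):
    return ["".join(p) for k in range(max_len + 1)
            for p in product(alphabet, repeat=k)]

def string_sat(cons, n, m):
    """Brute-force search over bounded strings (exponential; demo only)."""
    names = ([f"{p}{k}" for k in range(1, n + 1) for p in "sr"]
             + [f"v{c}" for c in range(m)])
    domains = [strings_up_to("ab", 1)] * (2 * n) + [strings_up_to("ab", 3)] * m
    for values in product(*domains):
        env = dict(zip(names, values))
        if all(holds(con, env) for con in cons):
            return True
    return False

def bool_sat(clauses, n):
    """Brute-force 3-CNF satisfiability, used only as a reference."""
    for bits in product([False, True], repeat=n):
        if all(any((lit > 0) == bits[abs(lit) - 1] for lit in clause)
               for clause in clauses):
            return True
    return False
```

On small examples, `string_sat(reduce_to_e_t_const(clauses, n), n, len(clauses))` can be compared against `bool_sat(clauses, n)`; the two should agree if the construction is implemented as described.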

Corollary 2 The satisfiability problem over the theory of strings with contains at constant position, Boolean negation
and conjunction is NP-hard.

Proof: In the proof above, we can replace $s_i \neq r_i$ by $\neg(s_i \sqsupseteq_1 r_i)$ and $v_c \neq bbb$ by $\neg(v_c \sqsupseteq_1 bbb)$. The NP-hardness
proof still goes through. Dis-equality between strings of the same length can be expressed as dis-containment at position 1. □

4.3 Equality + concAt (E+A) 

Theorem 4 The satisfiability problem over the theory of strings with equality, concat and Boolean conjunction (E+A
fragment) is NP-hard.

Proof: We prove this by reducing 3-CNF-SAT to the E+A fragment of the theory of strings. We describe a transformation
that maps a 3-CNF Boolean formula to a formula in the E+A fragment of the theory of strings (over the alphabet $\Sigma = \{a\}$)
such that there is a satisfying assignment for the Boolean formula if and only if there is a satisfying assignment for the
formula over strings.
Let $\psi$ be defined as

$\psi(x_i) = s_i$ and $\psi(\neg x_i) = r_i$

where $s_i$ and $r_i$ are strings of length at most 1. $s_i = a$ if and only if $x_i$ is assigned true, otherwise, it is $\epsilon$. Similarly,
$r_i = a$ if and only if $x_i$ is assigned false, otherwise it is $\epsilon$. So, for any literal $l$, $\psi(l)$ would be $a$ if and only if $l$ is
assigned true.

We also need to add constraints to ensure consistency, that is, exactly one of $x_i$ or $\neg x_i$ is assigned true. For consistency,
for each variable $x_i$, we must have the constraint

$s_i @ r_i = a$

This ensures that exactly one of $s_i$ or $r_i$ is $a$, that is, exactly one of $\psi(x_i)$ or $\psi(\neg x_i)$ is $a$.
Each clause $c_i = l_1 \vee l_2 \vee l_3$ is transformed to

$\psi(l_1) @ \psi(l_2) @ \psi(l_3) @ p_i = aaa$

where $p_i$ is of length at most 2. Thus, the sum of the lengths of $\psi(l_1)$, $\psi(l_2)$ and $\psi(l_3)$ must be at least 1, that is, at least
one of $\psi(l_j)$ must be $a$, which is possible if and only if $l_j$ is assigned true. So, at least one literal in each clause is true.

A set of string constraints $\psi(\phi)$ is obtained by applying the above transformations to each clause $c_i$ in the 3-CNF Boolean
formula $\phi$ and taking the union of all the obtained string constraints.

Let $\mathcal{I}$ be a satisfying assignment to $\phi$ such that $\mathcal{I}(x)$ denotes the assignment to $x$. By construction, there is an
assignment $\mathcal{I}'$ to $\psi(\phi)$ such that $\mathcal{I}'(s_i) = a$ and $\mathcal{I}'(r_i) = \epsilon$ if and only if $\mathcal{I}(x_i) = true$.

Thus, the E+A fragment of string theory is NP-hard.

□ 
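
As a rough illustration of the E+A construction, the sketch below builds the concatenation equalities for a small 3-CNF formula and searches for a satisfying string assignment by enumeration. The clause encoding (3-tuples of nonzero integers, negative meaning a negated variable) and all function names are assumptions for this example; note that only equality, concatenation and conjunction are used, with no negation.

```python
from itertools import product

def psi(lit):
    """Name of the string variable for a literal: s_k for x_k, r_k for not x_k."""
    return f"s{lit}" if lit > 0 else f"r{-lit}"

def reduce_to_e_a(clauses, n):
    """Each constraint is (variables to concatenate, constant they must equal)."""
    # Consistency: s_k @ r_k = a, so exactly one of s_k, r_k is "a".
    cons = [([f"s{k}", f"r{k}"], "a") for k in range(1, n + 1)]
    # Clause c: psi(l1) @ psi(l2) @ psi(l3) @ p_c = aaa with |p_c| <= 2.
    for c, clause in enumerate(clauses):
        cons.append(([psi(lit) for lit in clause] + [f"p{c}"], "aaa"))
    return cons

def string_sat(cons, n, m):
    """Brute-force search over the unary alphabet {a} (demo only)."""
    names = ([f"{p}{k}" for k in range(1, n + 1) for p in "sr"]
             + [f"p{c}" for c in range(m)])
    # s_k, r_k have length at most 1; the padding p_c has length at most 2.
    domains = [["", "a"]] * (2 * n) + [["", "a", "aa"]] * m
    for values in product(*domains):
        env = dict(zip(names, values))
        if all("".join(env[v] for v in parts) == const
               for parts, const in cons):
            return True
    return False
```

Because the padding variable can contribute at most two characters, a clause whose three literal strings are all empty cannot reach `aaa`, which is the length argument used in the proof.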

Corollary 3 The satisfiability problem over the theory of strings with contains ($\sqsupseteq$) and concat (C+A fragment) is
NP-hard.

Proof: Equality can be expressed with two-way containment. Once again, note that there is no negation in this 
fragment. □ 

4.4 Equality + eXtract-with-constant-indices (E+X-Const) 

Theorem 5 The satisfiability problem over the theory of strings with equality, extract with constant indices, Boolean 
negation and conjunction (E+X-CONST fragment) is NP-hard. 

Proof: We prove this by reducing 3-CNF-SAT to the E+X-CONST fragment of the theory of strings. We describe a transformation
that maps a 3-CNF Boolean formula to a formula in the E+X-CONST fragment of the theory of strings (over the
alphabet $\Sigma = \{a, b\}$) such that there is a satisfying assignment for the Boolean formula if and only if there is a satisfying
assignment for the formula over strings.
Let $\psi$ be defined as

$\psi(x_i) = s_i$ and $\psi(\neg x_i) = r_i$

where $s_i$ and $r_i$ are strings of length at most 1. To make them exactly of length 1, we require $s_i \neq \epsilon \wedge r_i \neq \epsilon$. $s_i = a$ if
and only if $x_i$ is assigned true, otherwise, it is $b$. Similarly, $r_i = a$ if and only if $x_i$ is assigned false, otherwise it is $b$.
So, for any literal $l$, $\psi(l)$ would be $a$ if and only if $l$ is assigned true.

We also need to add constraints to ensure consistency, that is, exactly one of $x_i$ or $\neg x_i$ is assigned true. For consistency,
for each variable $x_i$, we must have the constraint

$s_i \neq r_i$

This ensures that exactly one of $s_i$ or $r_i$ is $a$.
Each clause $c = l_1 \vee l_2 \vee l_3$ is transformed to

$v_c[1 : 1] = \psi(l_1)$
$v_c[2 : 2] = \psi(l_2)$
$v_c[3 : 3] = \psi(l_3)$
$v_c \neq bbb$

where $v_c$ is a new variable for clause $c$ and is of length at most 3.

Thus, at least one of $\psi(l_i)$ must be $a$, which is possible if and only if $l_i$ is assigned true. So, at least one literal in each
clause is true.

A set of string constraints $\psi(\phi)$ is obtained by applying the above transformations to each clause $c_j$ in the 3-CNF Boolean
formula $\phi$ and taking the union of all the obtained string constraints.

Let $\mathcal{I}$ be a satisfying assignment to $\phi$ such that $\mathcal{I}(x)$ denotes the assignment to $x$. By construction, there is an
assignment $\mathcal{I}'$ to $\psi(\phi)$ such that $\mathcal{I}'(s_i) = a$ and $\mathcal{I}'(r_i) = b$ if and only if $\mathcal{I}(x_i) = true$.

Thus, the E+X-CONST fragment of string theory is NP-hard.

□ 
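
The last step of the proof, constructing the string assignment from a Boolean one, can be sketched as follows. The helper names (`witness`, `satisfies_e_x_const`) and the integer encoding of literals are hypothetical; the sketch builds the string assignment from a given Boolean assignment and then checks the E+X-CONST constraints directly, rather than searching for a solution.

```python
def psi(lit):
    """Name of the string variable for a literal: s_k for x_k, r_k for not x_k."""
    return f"s{lit}" if lit > 0 else f"r{-lit}"

def extract(s, i, j):
    """s[i : j] with 1-indexed, inclusive constant indices."""
    return s[i - 1:j]

def witness(clauses, bits):
    """Build the string assignment from a Boolean one.

    bits maps each variable index k to True/False; s_k is "a" exactly when
    x_k is true, r_k is "a" exactly when x_k is false, and each v_c spells
    out the three literal strings of clause c.
    """
    env = {}
    for k, value in bits.items():
        env[f"s{k}"] = "a" if value else "b"
        env[f"r{k}"] = "b" if value else "a"
    for c, clause in enumerate(clauses):
        env[f"v{c}"] = "".join(env[psi(lit)] for lit in clause)
    return env

def satisfies_e_x_const(clauses, env):
    """Check the constraints produced by the E+X-CONST transformation."""
    variables = {abs(lit) for clause in clauses for lit in clause}
    for k in variables:
        s, r = env[f"s{k}"], env[f"r{k}"]
        if s == "" or r == "" or s == r:
            return False
    for c, clause in enumerate(clauses):
        v = env[f"v{c}"]
        for pos, lit in enumerate(clause, start=1):
            if extract(v, pos, pos) != env[psi(lit)]:
                return False
        if v == "bbb":
            return False
    return True
```

With this construction, the only constraint that can fail is the disequality with `bbb`, and it fails exactly for a clause whose three literals are all false.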

We now show that even without equality, the fragment of the theory of strings having contains as string predicate with 
Boolean negation and conjunction is also hard. This is the final result of the section. 

4.5 Containment (C) 

Theorem 6 The satisfiability problem over the theory of strings with contains, Boolean negation and conjunction (C
fragment) is NP-hard.

Proof: We prove this by reducing 3-CNF-SAT to the C fragment of the theory of strings. We describe a transformation that
maps a 3-CNF Boolean formula to a formula in the C fragment of the theory of strings (over the alphabet $\Sigma = \{a, b\}$) such
that there is a satisfying assignment for the Boolean formula if and only if there is a satisfying assignment for the
formula over strings.
Let $\psi$ be defined as

$\psi(x_i) = s_i$ and $\psi(\neg x_i) = r_i$

where $s_i$ and $r_i$ are strings of length at most 1. To make them exactly of length 1, we require $\neg(\epsilon \sqsupseteq s_i) \wedge \neg(\epsilon \sqsupseteq r_i)$. $s_i = a$
if and only if $x_i$ is assigned true, otherwise, it is $b$. Similarly, $r_i = a$ if and only if $x_i$ is assigned false, otherwise it is
$b$. So, for any literal $l$, $\psi(l)$ would be $a$ if and only if $l$ is assigned true.

We also need to add constraints to ensure consistency, that is, exactly one of $x_i$ or $\neg x_i$ is assigned true. For consistency,
for each variable $x_i$, we must have the constraint

$\neg(s_i \sqsupseteq r_i)$

Each clause $c = l_1 \vee l_2 \vee l_3$ is transformed to

$v_c \sqsupseteq \psi(l_i)$ for $i = 1, 2, 3$
$\neg(bbb \sqsupseteq v_c)$

where $v_c$ is a new variable for clause $c$ and is of length at most 3.

Thus, at least one of $\psi(l_i)$ must be $a$, which is possible if and only if $l_i$ is assigned true. So, at least one literal in each
clause is true.

A set of string constraints $\psi(\phi)$ is obtained by applying the above transformations to each clause $c_i$ in the 3-CNF Boolean
formula $\phi$ and taking the union of all the obtained string constraints.

Let $\mathcal{I}$ be a satisfying assignment to $\phi$ such that $\mathcal{I}(x)$ denotes the assignment to $x$. By construction, there is an
assignment $\mathcal{I}'$ to $\psi(\phi)$ such that $\mathcal{I}'(s_i) = a$ and $\mathcal{I}'(r_i) = b$ if and only if $\mathcal{I}(x_i) = true$.

Thus, the C fragment of string theory is NP-hard. □

5 Conclusion and Future Work 

The analysis of different fragments of the theory of strings presented in this paper shows that the satisfiability problem 
for even small non-trivial fragments is NP-complete. Thus, it is unlikely that an efficient (polynomial-time) algorithm
for checking the satisfiability of string constraints would be found. Hence, a simple approach based on Boolean encoding of
string constraints to propositional logic is, in principle, as effective as any other technique for solving string constraints.
This justifies a "byte-blast" approach to solving string constraints, which relies on encoding strings as bit-vectors and
using an off-the-shelf bit-vector SMT solver. Further, these hardness results underline the importance of using domain
knowledge about string constraints arising out of security applications. We believe that, in practice, word-level reasoning
over strings that exploits such domain knowledge through pragmatic approaches such as abstraction-refinement might
prove to be very effective in making an efficient and scalable solver for the theory of strings. The key challenge in developing an
SMT solver for the theory of strings is the identification of such properties of string constraints arising from real code.

Inspired by the success of abstraction-refinement based approaches for SMT solving (e.g., [21] [7] [14]), we believe
such an approach would be useful for the theory of strings also. We identify the abstraction techniques that we believe
would be especially useful in the context of a theory of strings:

1. Length abstraction: To our knowledge, this approach was first published by Bjorner et al. [2]. It operates
by creating an over-approximation of the actual formula by abstracting each string constraint with a corresponding
length constraint. The resulting integer linear arithmetic formula is solved to obtain candidate lengths for the strings
in the original formula, with a possible refinement needed if these candidate lengths turn out to be too small. We
believe that this general idea can be used, but with some guidance to the solver so that it does not simply generate the smallest
lengths.

2. Position abstraction: We have observed that, in the security applications of interest, string-containment is a widely
used predicate and encoding the choice of position of containment adds significant complexity to the constraint
satisfaction problem. For large string lengths, a standard byte-blast approach which reduces the string constraints
to a bit-vector formula would require the SAT solver to branch over a large set of choices of positions. We hypothesize,
based on our observations of string constraints generated by colleagues in security applications, that the
position and order of containment of substrings is often not critical to finding a satisfying assignment. Hence, an
effective approach to construct an under-approximation of the string formula would be fixing some heuristic ordering
of containment constraints. If the formula with this fixed ordering is unsatisfiable, the unsat core generated by the
SAT solver can be used to selectively refine the ordering.
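
To give a flavour of the length abstraction in item 1, the sketch below maps a few positive string atoms to length constraints and enumerates candidate lengths up to a bound. This is only an assumed illustration: the constraint encoding and function names are hypothetical, negated atoms are simply left out (which keeps the abstraction an over-approximation), and plain enumeration stands in for the integer linear arithmetic solver mentioned above.

```python
from itertools import product

def length_of(term, lengths):
    """Length of a term: ("const", text) or the name of a string variable."""
    if isinstance(term, tuple):
        return len(term[1])
    return lengths[term]

def length_ok(con, lengths):
    """Length constraint implied by one positive string atom."""
    op = con[0]
    a = length_of(con[1], lengths)
    b = length_of(con[2], lengths)
    if op == "eq":            # x = y        implies |x| == |y|
        return a == b
    if op == "contains":      # x contains y implies |x| >= |y|
        return a >= b
    if op == "contains_at":   # y sits at position i of x
        return a >= con[3] - 1 + b
    if op == "concat_eq":     # x @ y = z    implies |x| + |y| == |z|
        return a + b == length_of(con[3], lengths)
    raise ValueError(op)

def candidate_lengths(cons, variables, l_max):
    """Yield every length assignment (up to l_max) allowed by the abstraction."""
    for values in product(range(l_max + 1), repeat=len(variables)):
        lengths = dict(zip(variables, values))
        if all(length_ok(con, lengths) for con in cons):
            yield lengths
```

A candidate produced this way only satisfies the abstraction; the original string formula may still be unsatisfiable for those lengths, in which case another candidate has to be tried.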

The overall approach we envisage will be similar to the iterative construction of over- and under-approximate formulas
as performed in prior work on model checking [23] and SMT solving for bit-vector arithmetic [7]. It would be
interesting to evaluate how such an approach based on abstraction-refinement performs for string formulas generated
in practice from security applications.